29 research outputs found

    Statistical Models for Co-occurrence Data

    Get PDF
    Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM--based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms

    Mixture Models for Co-occurrence and Histogram Data

    No full text
    Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution, we develop a general statistical framework for analyzing co-occurrence data based on probabilistic clustering by mixture models. More specifically, we discuss three models which pursue different modeling goals and which differ in the way they define the probabilistic partitioning of the observations. Adopting the maximum likelihood principle, annealed EM algorithms are derived for parameter estimation. From the class of potential applications in pattern recognition and data analysis, we have chosen document retrieval, language modeling, and unsupervised texture segmentation to test and evaluate the proposed algorithms. 1. Introduction The type of data investigated in this paper is best described by the term co-occurrence data (COD). The general setting is as follows: Suppose two finite sets X = {x 1 ,...,x N } and Y = {y 1 ,...,y M } of abstract object..

    Shape Context: A new descriptor for shape matching and object recognition

    No full text
    We introduce a new shape descriptor, the shape context, for correspondence recovery and shape-based object recognition. The shape context at a point captures the distribution over relative positions of other shape points and thus summarizes global shape in a rich, local descriptor. Shape contexts greatly simplify recovery of correspondences between points of two given shapes. Moreover, the shape context leads to a robust score for measuring shape similarity, once shapes are aligned. The shape context descriptor is tolerant to all common shape deformations. As a key advantage no special landmarks or key-points are necessary. It is thus a generic method with applications in object recognition, image registration and point set matching. Using examples involving both handwritten digits and 3D objects, we illustrate its power for object recognition
    corecore